Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters

Authors

  • Éva Székely
  • João P. Cabral
  • Peter Cahill
  • Julie Carson-Berndsen
Abstract

A great challenge for text-to-speech synthesis is to produce expressive speech. The main problem is that it is difficult to synthesise high-quality speech using expressive corpora. With the increasing interest in audiobook corpora for speech synthesis, there is a demand to synthesise speech which is rich in prosody, emotions and voice styles. In this work, Self-Organising Feature Maps (SOFM) are used for clustering the speech data using voice quality parameters of the glottal source, in order to map out the variety of voice styles in the corpus. Subjective evaluation showed that this clustering method successfully separated the speech data into groups of utterances associated with different voice characteristics. This work can be applied in unit-selection synthesis by selecting appropriate data sets to synthesise utterances with specific voice styles. It can also be used in parametric speech synthesis to model different voice styles separately.
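
The clustering step described in the abstract can be illustrated with a minimal self-organising map written in NumPy. This is only a sketch under stated assumptions, not the authors' implementation: the feature matrix is random placeholder data standing in for per-utterance glottal source parameters, and the 3x3 grid, the linear decay schedules, and the names `train_som` and `assign_clusters` are hypothetical choices made for illustration.

```python
import numpy as np


def train_som(data, grid_shape=(3, 3), n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a small self-organising map; returns node weights (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid position of every node, used for the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen data samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]    # pick one random training sample
        # Best-matching unit: the node whose weight vector is closest to x.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Gaussian neighbourhood around the BMU, measured on the grid.
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull every node towards x, weighted by its neighbourhood value.
        weights += lr * h[:, None] * (x - weights)
    return weights


def assign_clusters(data, weights):
    """Label each sample with the index of its nearest map node."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)


# Placeholder data: 200 "utterances" x 5 hypothetical voice-quality parameters.
rng = np.random.default_rng(42)
raw = rng.normal(size=(200, 5))
# Z-normalise each parameter so that no single one dominates the distance.
features = (raw - raw.mean(axis=0)) / raw.std(axis=0)

weights = train_som(features, grid_shape=(3, 3))
labels = assign_clusters(features, weights)
```

Each value in `labels` is the index of a map node, so utterances sharing a label form one candidate voice-style group. With real data, the groups would still need a listening test of the kind the abstract reports before being treated as distinct styles.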


Similar resources

Evaluating expressive speech synthesis from audiobooks in conversational phrases

CNGL, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland. {eva.szekely|mohamed.abou-zleikha}@ucdconnect.ie, {joao.cabral|peter.cahill|julie.berndsen}@ucd.ie. Abstract: Audiobooks are a rich resource of large quantities of natural sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice sty...

Unsupervised speaker and expression factorization for multi-speaker expressive synthesis of ebooks

This work aims to improve expressive speech synthesis of ebooks for multiple speakers by using training data from many audiobooks. Audiobooks contain a wide variety of expressive speaking styles which are often impractical to annotate. However, the speaker-expression factorization (SEF) framework, which has been proven to be a powerful tool in speaker and expression modelling usually requires t...

Creating expressive synthetic voices by unsupervised clustering of audiobooks

In this work we design an approach for automatic feature selection and voice creation for expressive synthesis. Our approach is guided by two main goals: (1) increasing the flexibility of expressive voice creation and (2) overcoming the limitations of speaking styles in expressive synthesis. We define a novel set of features, combining traditionally used prosodic features with spectral features...

Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis

The generation of expressive speech is a great challenge for text-to-speech synthesis in audiobooks. One of the most important factors is the variation in speech emotion or voice style. In this work, we developed a method to predict the emotion from a sentence so that we can convey it through the synthetic voice. It consists of combining a standard emotion-lexicon based technique with the polar...

Towards Glottal Source Controllability in Expressive Speech Synthesis

In order to obtain more human-like sounding human-machine interfaces we must first be able to give them expressive capabilities in the way of emotional and stylistic features so as to closely adequate them to the intended task. If we want to replicate those features it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional...

Publication date: 2011